Using Latent Semantic Analysis of Email to Detect Change in Social Groups

Authors

  • Ian McCulloh
  • Eric Daimler
  • Kathleen M. Carley
Abstract

Email text data is a rich resource that, when properly used, may enhance warning of economically material events in commercial enterprises. Armed with the temporal classification that email time stamps provide on a large data set, we combine latent semantic analysis with statistical process control to detect potential change in sentiment. A novel approach to identifying the causes of change is proposed: concept scores are averaged across statistically significant time periods, and an inverse singular value decomposition is then performed. The resulting term-by-significant-time-period matrix is used to explore potential causes of change, which are compared to historical events. Our findings suggest that causes for significant change in semantic intent can be identified in some circumstances. On a dataset of 50,000 unique emails from Enron, we demonstrate this novel approach to investigate semantic change in email text data.
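
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the control chart, the number of concepts, or the term weighting, so the Shewhart-style threshold, k = 2, raw term counts, the function name lsa_change_sketch, and the toy data below are all assumptions made for illustration.

```python
import numpy as np

def lsa_change_sketch(term_doc, periods, k=2, baseline=3, z=3.0):
    """Hypothetical helper: LSA concept scores + a simple control limit.

    term_doc : (n_terms, n_docs) term counts, one column per email
    periods  : length n_docs array of integer time-period labels
    """
    term_doc = np.asarray(term_doc, dtype=float)
    periods = np.asarray(periods)

    # Truncated SVD: term_doc ~= U_k @ diag(s_k) @ Vt_k
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    doc_scores = (s_k[:, None] * Vt_k).T  # (n_docs, k) concept scores

    # Average the concept scores within each time period.
    labels = np.unique(periods)
    period_scores = np.vstack(
        [doc_scores[periods == p].mean(axis=0) for p in labels]
    )

    # Assumed control chart: limits estimated from the first `baseline`
    # periods; the floor on sigma guards against a zero-variance baseline.
    mu = period_scores[:baseline].mean(axis=0)
    sigma = np.maximum(period_scores[:baseline].std(axis=0, ddof=1), 1e-9)
    out_of_control = np.any(np.abs(period_scores - mu) > z * sigma, axis=1)

    # Inverse step: map averaged concept scores back to term space,
    # giving a term-by-flagged-period matrix.
    term_by_period = U_k @ period_scores[out_of_control].T
    return labels[out_of_control], term_by_period

# Toy data (assumed): 4 terms, 10 emails, 5 periods of 2 emails each.
# Periods 0-3 use terms 0 and 1; period 4 switches to terms 2 and 3.
topic_a = [[3, 2, 0, 0], [2, 3, 0, 0]]
topic_b = [[0, 0, 3, 2], [0, 0, 2, 3]]
term_doc = np.array(topic_a * 4 + topic_b).T
periods = np.repeat(np.arange(5), 2)

flagged_periods, term_by_period = lsa_change_sketch(term_doc, periods)
print(flagged_periods)
print(np.round(term_by_period, 2))
```

In the toy data only the final period is flagged, and its column in term_by_period puts weight on the two terms that appear only in that period. Those terms would be the starting point for comparing against known events.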

Similar resources

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are among the most popular tools on the Internet and are widely used by expert and novice users. Constructing an adequate query that best represents the user's information need is an important concern for web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

phishGILLNET—phishing detection methodology using probabilistic latent semantic analysis, AdaBoost, and co-training

Identity theft is one of the most profitable crimes committed by felons. In cyberspace, this is commonly achieved using phishing. We propose here a robust server-side methodology to detect phishing attacks, called phishGILLNET, which incorporates the power of natural language processing and machine learning techniques. phishGILLNET is a multi-layered approach to detect phishing attacks. The ...

The Symbiosis of Human and Semantic Technology Through the Lens of Actor-Network Theory

Background: Semantic technologies (STs) have made machine reasoning possible by providing intelligent data management methods. This capability has created new forms of interaction between humans and STs, which is called "semantic interaction." The increasing spread of this form of interaction in daily life reveals the need to identify the factors affecting it and introduce the requirements of...

Finding Groups of People in Google News

In this paper, we study the problem of content-based social network discovery among people who frequently appear in world news. Google News is used as the source of data. We describe a probabilistic framework for associating people with groups. A low-dimensional topic-based representation is first obtained for news stories via probabilistic latent semantic analysis (PLSA). This is followed by c...

Publication date: 2008